Crime in the US is one of the most discussed issues in the country. Although the Federal Bureau of Investigation (FBI) reported an overall decline in violent and property crime in 2018, media reports of rising crime in the US, especially mass shootings, have become more frequent in recent years. Rising crime is a threat to public safety and welfare. At the national level, violent crime and homicide rates increased from 2014 to 2016, but rates remain near historical lows compared with those of the 1990s.
Washington, DC saw its murder rate increase by 35.6 percent in 2018 (Brennan Center for Justice). Between 2017 and 2018, homicides increased the most of all crime types in DC, by about 38%, followed by auto theft, which increased by 13%. Other crimes, such as sex abuse, assault and robbery, decreased.
This report focuses on all crimes reported in the DC metro police system, including violent crime, theft, arson, assault, homicide, sex abuse and burglary. These crimes can be grouped into violent crime and property crime. Violent crime refers to murder, robbery, rape and aggravated assault; property crime includes burglary, larceny-theft and motor vehicle theft. Murder includes murder and non-negligent manslaughter. Total crime incidence includes all of the above.
The rest of this report contains seven chapters: chapter 2 describes the data (source, variable definitions and geographic coverage), chapter 3 covers crime types and methods, chapter 4 shows the police districts of the DC area, chapter 5 describes the spatial distribution of crime in DC, chapters 6 and 7 analyze the relationship between time and crime, and chapter 8 presents the conclusion of the report.
The source data for our exploratory data analysis is a CSV file of crime incident data for DC in 2018, sourced from OpenDC. The CSV contains 33,783 crime incidents. Each record includes the reported date and time, the method/weapon used, the offense classification, the location of the crime (block, ward, neighbourhood, voting_precinct, latitude and longitude), the start date, the end date and a record ID.
Links: Crime_Incidents_in_2018 / Police_Districts
For context, homicide in this report refers to killing a person purposely, or otherwise with malice aforethought. Sexual abuse is engaging in, or causing another person to submit to, a sexual act by force, threat or fear. Arson is the malicious burning, or attempted burning, of another person's property, structure, vessel or vehicle. Robbery is taking anything of value from another person by force, fear or violence. Assault is purposely or knowingly causing serious bodily injury, threatening to do so, or engaging in any act that creates a risk of physical injury to another person. Burglary is unlawful entry into a property with intent to commit a criminal offense. The report date is the date the offense was reported to the police, which may be later than the date the incident occurred (DC Metropolitan Police Department).
In the first part, we explore the frequency of the main types and methods of crime in DC. First, we identify which types of crime occur most often and classify them. Then we divide crime methods into crimes committed with a weapon and crimes committed without one. This classification of crime types and methods is the groundwork for exploring the relationship between crime and other factors.
Is there any correlation between types of crime and the use of weapons?
## ARSON                          5
## ASSAULT W/DANGEROUS WEAPON  1672
## BURGLARY                    1419
## HOMICIDE                     160
## MOTOR VEHICLE THEFT         2393
## ROBBERY                     2024
## SEX ABUSE                    274
## THEFT F/AUTO               11609
## THEFT/OTHER                14227
Nine different types of crime occurred in DC. The bar plot above shows the frequency of each type. THEFT F/AUTO occurred 11609 times and THEFT/OTHER 14227 times, so theft is by far the most frequent type of crime. The least frequent type is ARSON, which occurred only 5 times.
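The frequency table above can be reproduced by counting each incident's offense. The sketch below is a minimal Python illustration, not the authors' code; the report's own outputs come from R. It assumes each incident is a dict with an "OFFENSE" key, a hypothetical record layout. It then ranks the reported counts to confirm the most and least frequent types.

```python
from collections import Counter

# Offense counts as printed in the frequency table above.
OFFENSE_COUNTS = {
    "ARSON": 5,
    "ASSAULT W/DANGEROUS WEAPON": 1672,
    "BURGLARY": 1419,
    "HOMICIDE": 160,
    "MOTOR VEHICLE THEFT": 2393,
    "ROBBERY": 2024,
    "SEX ABUSE": 274,
    "THEFT F/AUTO": 11609,
    "THEFT/OTHER": 14227,
}


def offense_frequencies(incidents):
    """Count incidents per offense (assumes an "OFFENSE" key per record)."""
    return Counter(row["OFFENSE"] for row in incidents)


def most_and_least_frequent(counts):
    """Return the (offense, count) pairs with the highest and lowest counts."""
    ranked = Counter(counts).most_common()  # sorted by count, descending
    return ranked[0], ranked[-1]


most, least = most_and_least_frequent(OFFENSE_COUNTS)
print("Most frequent:", most)    # ('THEFT/OTHER', 14227)
print("Least frequent:", least)  # ('ARSON', 5)
print("Total incidents:", sum(OFFENSE_COUNTS.values()))  # 33783
```

The total matches the 33,783 incidents in the CSV, so every incident carries exactly one of the nine offense labels.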
## GUN KNIFE OTHERS
## 1598 772 31413
Analyzing crime methods, we found that some offenders carried weapons while others did not. Whether a weapon was carried gives a preliminary indication of how dangerous each type of crime is. Approximately 7.02% of the crimes were committed with a weapon: in 1598 crimes the offender used a gun, and in 772 the offender used a knife.
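The 7.02% figure follows from the method counts above, assuming GUN and KNIFE count as "with weapon" and OTHERS as "without weapon". That split is our reading of the text, not something the report states explicitly. Here is a minimal Python check:

```python
# Incidents per METHOD value, as printed above.
METHOD_COUNTS = {"GUN": 1598, "KNIFE": 772, "OTHERS": 31413}

# Assumption: GUN and KNIFE are the "with weapon" methods.
WEAPON_METHODS = {"GUN", "KNIFE"}


def weapon_share(method_counts, weapon_methods=WEAPON_METHODS):
    """Fraction of all incidents whose method is one of weapon_methods."""
    total = sum(method_counts.values())
    armed = sum(n for method, n in method_counts.items() if method in weapon_methods)
    return armed / total


share = weapon_share(METHOD_COUNTS)
print(f"{share:.2%} of crimes involved a gun or knife")  # 7.02% of crimes involved a gun or knife
```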
##                             GUN KNIFE OTHERS
## ARSON                         0     0      5
## ASSAULT W/DANGEROUS WEAPON  639   597    436
## BURGLARY                      6     2   1411
## HOMICIDE                    122    10     28
## MOTOR VEHICLE THEFT           0     0   2393
## ROBBERY                     818   141   1065
## SEX ABUSE                    12    19    243
## THEFT F/AUTO                  0     0  11609
## THEFT/OTHER                   1     3  14223
To further explore the relationship between crime type and crime method, we used a frequency table (cross-tabulation) to show which types of crime offenders were more likely to commit with weapons. Guns were used most often in ROBBERY (818), ASSAULT W/DANGEROUS WEAPON (639) and HOMICIDE (122), while knives were used most often in ASSAULT W/DANGEROUS WEAPON (597) and ROBBERY (141).
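Raw counts favour common offenses. Normalizing each row of the cross-tabulation by its total shows how often each offense involves a gun relative to how often it occurs. The Python sketch below copies the counts from the table above. The helper `row_shares` is hypothetical and not part of the original R analysis.

```python
# Offense-by-method counts copied from the cross-tabulation above.
CROSSTAB = {
    "ARSON": {"GUN": 0, "KNIFE": 0, "OTHERS": 5},
    "ASSAULT W/DANGEROUS WEAPON": {"GUN": 639, "KNIFE": 597, "OTHERS": 436},
    "BURGLARY": {"GUN": 6, "KNIFE": 2, "OTHERS": 1411},
    "HOMICIDE": {"GUN": 122, "KNIFE": 10, "OTHERS": 28},
    "MOTOR VEHICLE THEFT": {"GUN": 0, "KNIFE": 0, "OTHERS": 2393},
    "ROBBERY": {"GUN": 818, "KNIFE": 141, "OTHERS": 1065},
    "SEX ABUSE": {"GUN": 12, "KNIFE": 19, "OTHERS": 243},
    "THEFT F/AUTO": {"GUN": 0, "KNIFE": 0, "OTHERS": 11609},
    "THEFT/OTHER": {"GUN": 1, "KNIFE": 3, "OTHERS": 14223},
}


def row_shares(table, column):
    """For each row (offense), the fraction of its incidents in `column`."""
    return {row: counts[column] / sum(counts.values()) for row, counts in table.items()}


gun_share = row_shares(CROSSTAB, "GUN")
# Print the three offenses with the highest share of gun use.
for offense in sorted(gun_share, key=gun_share.get, reverse=True)[:3]:
    print(f"{offense:<27} {gun_share[offense]:.1%}")
```

On this measure HOMICIDE ranks first: 122 of its 160 incidents involved a gun. ROBBERY and ASSAULT W/DANGEROUS WEAPON follow, which are the same three offenses that lead by raw count, in a different order.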
There are seven police districts in Washington, DC. Each police district is divided into three sectors, a sector being an informal grouping of Police Service Areas (PSAs). In the following analysis, we look at the crimes that occurred in each police district.
In this section, we investigate the spatial distribution of crime in DC. We analyze crime locations by category to derive insights into crime frequencies in different areas. This dataset covers only basic geographic information about where each crime occurred.
What is the distribution of crime types in each police district?
This plot shows that districts 2 and 3 have the highest numbers of crimes, above 6000 each. District 7 has the lowest number, under 3000. The numbers of crimes in districts 1, 4, 5 and 6 range from 4000 to 5000.
To see the distribution of crimes, we first divide the DC area into the seven police districts and map the latitude and longitude of each crime. The map alone shows that crimes occur in every district. Because we lack population data for each district, we can analyze only crime frequency, not crime rate.
By separating crime types and zooming in on each district on the map, we can clearly see the distribution of different types of crime in each area. In District 1, most crimes happened in the north and central areas, and THEFTS were clearly the most frequent.
In District 2, most crimes happened in the northwest and southeast. THEFTS again occurred most frequently. Unlike the other districts, District 2 had no homicides.
Crimes are evenly distributed across District 3, and THEFTS are again the most frequent crime type there.
A wide variety of crimes occurred in District 4. Although THEFTS occurred most often, other crimes, including burglaries, robberies and sex abuse, were also frequent. Notably, there were several ASSAULTS WITH A DANGEROUS WEAPON in this area.
Most crimes in District 5 occurred in its southwest and north. Not surprisingly, THEFTS are also the most frequent crime type in both District 5 and District 6.
In District 7, crimes were concentrated in the northeast. THEFTS were the most frequent, and ASSAULTS WITH A DANGEROUS WEAPON were also common.
The analysis above shows that crimes involving weapons, which we consider dangerous crimes, occur in certain areas. To see where this specific type of crime is located, we subset the data to crimes whose method is GUN and map these gun crimes in DC. The map shows that most gun crimes occurred in the eastern part of DC and that gun crimes occurred least often in District 2.
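The subsetting step can be sketched in Python with the standard `csv` module. This is a minimal illustration under stated assumptions. The column names METHOD, DISTRICT, LATITUDE and LONGITUDE and the helper names are hypothetical, and the three sample rows are made up rather than taken from the dataset. Drawing the map itself would need a plotting library and is not shown.

```python
import csv
from collections import Counter


def load_incidents(path):
    """Read the incident CSV into a list of dicts, one per row.

    The column names used below are assumptions about the OpenDC export."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def gun_crimes(incidents):
    """Subset to incidents whose method is GUN."""
    return [row for row in incidents if row["METHOD"] == "GUN"]


def count_by_district(incidents):
    """Number of incidents per police district."""
    return Counter(row["DISTRICT"] for row in incidents)


def coordinates(incidents):
    """(latitude, longitude) pairs, e.g. for plotting on a map."""
    return [(float(row["LATITUDE"]), float(row["LONGITUDE"])) for row in incidents]


# Made-up three-row sample for illustration only (not real DC data).
sample = [
    {"METHOD": "GUN", "DISTRICT": "7", "LATITUDE": "38.84", "LONGITUDE": "-76.99"},
    {"METHOD": "OTHERS", "DISTRICT": "2", "LATITUDE": "38.91", "LONGITUDE": "-77.04"},
    {"METHOD": "GUN", "DISTRICT": "7", "LATITUDE": "38.85", "LONGITUDE": "-76.98"},
]
guns = gun_crimes(sample)
print(count_by_district(guns))  # Counter({'7': 2})
print(coordinates(guns))        # [(38.84, -76.99), (38.85, -76.98)]
```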
In conclusion, this chapter analyzes the relationship between crime locations and crime types and yields several insights. First, THEFT is the most common crime type in every police district. Second, the mix of crime types differs considerably among police districts. Third, gun crimes are more frequent in the east of DC than in the west.
The dataset includes the date and time of each crime in 2018, from which we can derive further insights. First, we explore how crime occurrence is distributed across crime times. Second, we look at the relationship between offense and crime time.
Do crime frequency and offense differ by crime time, both by season of the year and by time of day?
First, we add two new columns to the dataset, “crimemonth” and “crimeseason”, derived from “start_date”. We take December, January and February as winter. We then use these two variables to analyze crime frequency further.
## Winter Spring Summer Fall
## 7136 7892 9709 9046
The pie chart above shows crime frequency by season. Summer (9709) and fall (9046) account for the most crimes.
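The two derived columns can be sketched in Python as follows. The report states only the December to February winter. Grouping the remaining months into three-month spring, summer and fall blocks is an assumption made for illustration, as is the ISO 8601 format of `start_date`.

```python
from datetime import datetime

# December-February is winter, as stated in the report. The other
# three-month blocks are an assumption; the report does not list them.
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}


def add_month_and_season(row):
    """Return a copy of row with "crimemonth" and "crimeseason" added.

    Assumes row["start_date"] is an ISO 8601 string such as
    "2018-08-14T21:30:00" (hypothetical format)."""
    month = datetime.fromisoformat(row["start_date"]).month
    return {**row, "crimemonth": month, "crimeseason": SEASON_BY_MONTH[month]}


print(add_month_and_season({"start_date": "2018-12-03T08:15:00"}))
# {'start_date': '2018-12-03T08:15:00', 'crimemonth': 12, 'crimeseason': 'Winter'}
```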
## 1 2 3 4 5 6 7 8 9 10 11 12
## 2521 2180 2321 2407 2742 2866 3166 3334 3158 3279 2868 2941
The first bar plot shows crime occurrence by month in 2018. Counts are high in the middle of the year and lower at both ends, which suggests that crime is more likely to occur in summer. The month with the most incidents is August (3334). The month with the fewest is February (2180), in winter.
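Identifying the peak and lowest months is an argmax/argmin over the monthly counts printed above. Here is a short Python check, with the counts copied by hand:

```python
# Monthly incident counts from the output above (month number -> count).
MONTHLY_COUNTS = dict(zip(range(1, 13), [2521, 2180, 2321, 2407, 2742, 2866,
                                         3166, 3334, 3158, 3279, 2868, 2941]))

peak = max(MONTHLY_COUNTS, key=MONTHLY_COUNTS.get)    # month with most incidents
trough = min(MONTHLY_COUNTS, key=MONTHLY_COUNTS.get)  # month with fewest incidents
print(f"Peak: month {peak} ({MONTHLY_COUNTS[peak]} incidents)")        # Peak: month 8 (3334 incidents)
print(f"Lowest: month {trough} ({MONTHLY_COUNTS[trough]} incidents)")  # Lowest: month 2 (2180 incidents)
print("Total:", sum(MONTHLY_COUNTS.values()))                          # Total: 33783
```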
## DAY EVENING MIDNIGHT
## 12150 14394 7239
This bar plot shows crime frequency by shift. We divide the day into three shifts: DAY, EVENING and MIDNIGHT. Most crimes occurred in the evening (14394). The fewest occurred in the midnight shift (7239), clearly fewer than in the other two shifts. The day shift falls in between (12150).
This bar plot shows the frequency of each offense in the four seasons. Consistent with the previous section, THEFT F/AUTO and THEFT/OTHER are the two most frequent offenses, and almost every offense occurs in all four seasons.
Is the offense mix the same across crime times, both season of the year and time of day? We use statistical inference to explore this further.
H0: Offense and season of the year are independent. H1: They are not independent.
We use Pearson's chi-squared test of independence to test this hypothesis. The output below shows the test statistic, degrees of freedom and p-value.
##
## Pearson's Chi-squared test
##
## data: contable2
## X-squared = 154.27, df = 24, p-value < 2.2e-16
Since the p-value is very small (below 2.2e-16), we reject the null hypothesis that offense and season of the year are independent.
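The Pearson statistic compares each observed cell count with the count expected under independence, (row total x column total) / n. It has (r - 1)(c - 1) degrees of freedom, so a 9 x 4 offense-by-season table gives the df = 24 shown above. The Python function below is a from-scratch sketch of that computation, not the code used in this report. Like R's `chisq.test` for tables larger than 2 x 2, it applies no continuity correction.

```python
def chi_squared_independence(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c
    contingency table given as a list of rows of observed counts.

    No continuity correction is applied. Every row and column total must be
    positive, otherwise an expected count would be zero."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count under independence
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Toy 2 x 2 example with made-up counts.
stat, dof = chi_squared_independence([[10, 20], [30, 40]])
print(round(stat, 4), dof)  # 0.7937 1
```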
H0: Offense and time of day (shift) are independent. H1: They are not independent.
As with offense and season, we use the chi-squared test. The output below shows the test statistic, degrees of freedom and p-value.
##
## Pearson's Chi-squared test
##
## data: contable1
## X-squared = 2245.6, df = 16, p-value < 2.2e-16
The p-value is again very small, so we reject the null hypothesis that offense and time of day are independent.
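Both reported statistics have an even number of degrees of freedom (24 and 16). For even df, the chi-squared upper-tail probability has a closed form: exp(-x/2) multiplied by the sum over i = 0, ..., df/2 - 1 of (x/2)^i / i!. The Python sketch below is our own illustration, evaluated in log space to keep intermediate terms from underflowing or overflowing. It confirms that both p-values fall below 2.2e-16, the threshold below which R prints "< 2.2e-16".

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable with a
    positive even number of degrees of freedom, using the closed form
    exp(-x/2) * sum_{i=0}^{df/2-1} (x/2)**i / i!, evaluated in log space."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # log of each term: i*log(x/2) - log(i!)
    log_terms = [i * math.log(half) - math.lgamma(i + 1) for i in range(df // 2)]
    top = max(log_terms)
    log_sum = top + math.log(sum(math.exp(t - top) for t in log_terms))  # log-sum-exp
    return math.exp(log_sum - half)


# The two statistics reported above.
for stat, dof in [(154.27, 24), (2245.6, 16)]:
    p = chi2_sf_even_df(stat, dof)
    print(f"X-squared = {stat}, df = {dof}, p = {p:.3g}, below 2.2e-16: {p < 2.2e-16}")
```

For the second test the tail probability underflows to 0.0 in double precision. This is consistent with R reporting only an upper bound.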
Offense type is therefore associated with crime time, both the season of the year and the time of day.
The dataset also shows varying gaps between the date a crime occurred and the date it was reported. This is interesting because the gap may be related to other variables, including offense and time of day, so in this part we explore it further.
Is the time difference between the report date and the crime date related to the offense and to the time of day of the crime?
First, because the raw dataset has only report-date and crime-date columns, we compute the time difference and store it in a new column, ‘time_difference’, by subtracting column ‘start_date’ from column ‘report_date’.
Some values in time_difference are invalid (less than 0), so we keep only differences between 0 and 1000.
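The time-difference step can be sketched in Python with the standard `datetime` module. The column names `start_date` and `report_date` follow the text above. The ISO 8601 timestamp format and the choice of hours as the unit are assumptions, since the report does not state which unit the 0 to 1000 filter uses.

```python
from datetime import datetime


def time_difference_hours(row):
    """Hours from incident start to report: report_date minus start_date.

    Assumes both fields are ISO 8601 strings such as "2018-05-01T10:00:00";
    the format and the unit (hours) are assumptions for illustration."""
    start = datetime.fromisoformat(row["start_date"])
    report = datetime.fromisoformat(row["report_date"])
    return (report - start).total_seconds() / 3600


def filter_time_differences(rows, lower=0, upper=1000):
    """Attach time_difference to each row and keep only values in [lower, upper].

    Negative differences (reported before the recorded start) are treated as
    invalid and dropped, as are differences above the upper cutoff."""
    kept = []
    for row in rows:
        diff = time_difference_hours(row)
        if lower <= diff <= upper:
            kept.append({**row, "time_difference": diff})
    return kept


# Made-up rows: one valid, one negative, one beyond the cutoff.
rows = [
    {"start_date": "2018-05-01T10:00:00", "report_date": "2018-05-01T12:30:00"},
    {"start_date": "2018-05-02T09:00:00", "report_date": "2018-05-01T09:00:00"},
    {"start_date": "2018-01-01T00:00:00", "report_date": "2018-04-11T00:00:00"},
]
print([r["time_difference"] for r in filter_time_differences(rows)])  # [2.5]
```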
The box plots above show time_difference by offense, and the groups clearly differ. SEX ABUSE has the largest time difference. This may be because victims of sex abuse often struggle with shame and stigma and hesitate to report it. For homicide, by contrast, people tend to report quickly, so the time difference is relatively small. We also found that the range for ARSON, which has only 5 incidents in the whole dataset, is extremely small, whereas the ranges for the other offenses are relatively large.
To test whether time difference is related to offense, we use further statistical inference.
H0: The mean time difference is the same for every offense. H1: At least one offense has a different mean time difference.
We use a one-way ANOVA to test the hypothesis. The output below summarizes the ANOVA test and p-value.
##                Df    Sum Sq   Mean Sq F value Pr(>F)
## OFFENSE         8 7.785e+09 973181387   187.8 <2e-16 ***
## Residuals   17040 8.831e+10   5182504
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since the p-value is very small, we reject the null hypothesis that the mean time difference is the same for every offense. How long victims wait to report a crime is related to the offense. Victims seem reluctant to report some offenses, such as SEX ABUSE, promptly.
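One-way ANOVA compares the spread between group means with the spread within groups. For k groups and n observations, F = (SS_between / (k - 1)) / (SS_within / (n - k)). Below is a from-scratch Python sketch for illustration only; the report's output comes from R's `aov`.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups maps a group label (e.g. an offense or a shift) to a list of
    numeric observations (e.g. time differences)."""
    means = {label: sum(obs) / len(obs) for label, obs in groups.items()}
    n = sum(len(obs) for obs in groups.values())
    k = len(groups)
    grand_mean = sum(sum(obs) for obs in groups.values()) / n
    # Between-group sum of squares: group size times squared deviation of its mean.
    ss_between = sum(len(obs) * (means[label] - grand_mean) ** 2
                     for label, obs in groups.items())
    # Within-group sum of squares: deviations from each observation's own group mean.
    ss_within = sum((x - means[label]) ** 2
                    for label, obs in groups.items() for x in obs)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up toy data: two groups whose means differ by 3.
print(one_way_anova({"A": [1, 2, 3], "B": [4, 5, 6]}))  # (13.5, 1, 4)
```

Applied to the ANOVA table above, F is the ratio of the two mean squares, 973181387 / 5182504 ≈ 187.8. The degrees of freedom imply k = 9 offense groups and n = 17040 + 9 = 17049 time differences.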
Similarly, we first look at the distribution of time_difference across the three shifts of the day. The range of time_difference is largest for the midnight shift. In the day shift, by contrast, people tend to report crimes soonest, and the average difference is the smallest.
To test whether time difference is related to shift, we again run a hypothesis test.
H0: The mean time difference is the same for all three shifts. H1: At least one shift has a different mean time difference.
We use a one-way ANOVA to test the hypothesis. The output below shows the ANOVA summary, the p-value and Tukey's HSD pairwise comparisons.
##                Df    Sum Sq   Mean Sq F value Pr(>F)
## SHIFT           2 1.116e+09 558002878   100.1 <2e-16 ***
## Residuals   17046 9.498e+10   5571941
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Time_Difference ~ SHIFT, data = crimeTimeAndShift)
##
## $SHIFT
##                       diff       lwr      upr     p adj
## EVENING-DAY      552.53910  455.3952 649.6830 0.0000000
## MIDNIGHT-DAY     512.86239  398.3016 627.4232 0.0000000
## MIDNIGHT-EVENING -39.67671 -149.3451  69.9917 0.6731052
The p-value is very small, so we reject the null hypothesis that the mean time difference is the same for all three shifts. Because the ANOVA does not say which shifts differ, we use Tukey's HSD for pairwise comparisons. The adjusted p-values for the EVENING-DAY and MIDNIGHT-DAY pairs are extremely small, so these differences are significant: on average, evening and midnight crimes are reported later than day-shift crimes. For MIDNIGHT-EVENING, however, the adjusted p-value is 0.67, larger than 0.05. We therefore fail to reject the null hypothesis and find no evidence that the mean time differences of these two shifts differ.
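In Tukey's HSD output, a pair of shifts differs significantly at the family-wise 95% level exactly when its interval [lwr, upr] excludes zero. Below is a short Python check over the values printed above, copied by hand. The helper `significant_pairs` is hypothetical.

```python
# Tukey HSD output for SHIFT, copied from above: pair -> (diff, lwr, upr).
TUKEY_SHIFT = {
    "EVENING-DAY": (552.53910, 455.3952, 649.6830),
    "MIDNIGHT-DAY": (512.86239, 398.3016, 627.4232),
    "MIDNIGHT-EVENING": (-39.67671, -149.3451, 69.9917),
}


def significant_pairs(results):
    """Return the pairs whose family-wise confidence interval excludes zero."""
    return [pair for pair, (_diff, lwr, upr) in results.items()
            if not (lwr <= 0 <= upr)]


print(significant_pairs(TUKEY_SHIFT))  # ['EVENING-DAY', 'MIDNIGHT-DAY']
```

The mean differences are also mutually consistent: 552.53910 - 512.86239 = 39.67671, which matches the MIDNIGHT-EVENING difference with the sign reversed.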
In this section, we used EDA and ANOVA tests to explore the relationship between time difference and other factors. First, offense type is associated with the time difference between crime time and report time. Second, the differences among offenses are substantial, and SEX ABUSE has the largest time difference. Third, time difference is also related to the shift: day-shift crimes are reported sooner on average than evening and midnight crimes, which do not differ significantly from each other.
Examining the crimes reported in the DC metro police system data reveals several findings.
First, we examined the lists of crime offenses and crime methods. Then we analyzed the relationship between crime locations and crime types. We concluded that THEFT is the most common crime type in every police district and that the mix of crime types differs considerably among police districts.
Next, for crime time, we found not only that crime occurs most often in summer, especially in August, but also that offense is associated with crime time, both time of day and season of the year.
Finally, we examined the time difference between the report date and the crime date, and found that both offense and time of day are related to it.
First, the dataset itself has limitations. Most variables in this dataset are categorical or discrete rather than continuous, which limits the statistical methods we can apply. In the final project, we can use other models, such as logistic regression, to analyze these categorical variables.
Second, the information in this dataset is limited. Crime is the core concept of this report, but without population data we can use only crime frequency, not crime rate, to evaluate the risk in each region. Crime frequency is not a fully objective indicator, so we cannot conclude that an area is dangerous simply because its crime frequency is high. In future analysis, we will look for more objective indicators.
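For reference, a crime rate divides the number of crimes by the population at risk. A common convention, used for example in FBI crime statistics, expresses it per 100,000 residents. For a police district d this would be:

$$
\text{crime rate}_d = \frac{\text{crimes reported in district } d}{\text{population of district } d} \times 100{,}000
$$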
Third, this dataset contains only basic information about crimes, such as times and locations. To explore which social factors may lead to these crimes, we would need to supplement it with other datasets. We expect that other social factors associated with urban crime include population, housing prices, unemployment, inequality and the rapid pace of urbanization. In the final project, we will therefore add other datasets, such as the General Social Survey, for further exploration.
Brennan Center for Justice. (n.d.). Crime in 2018: Final Analysis. Retrieved from https://www.brennancenter.org/our-work/research-reports/crime-2018-final-analysis
Metropolitan Police Department. (n.d.). Retrieved from http://crimemap.dc.gov/CrimeDefinitions.aspx
Ruth Akor: Introduction, PowerPoint; Kaiqi Yu, Zichu Chen: Code; Qing Ruan, Zixuan Huang: Results description and conclusion